Good and bad controls
Four elemental confounds
Start at treatment (X)
Look for any arrows coming INTO X
Follow all possible paths to outcome (Y)
A valid adjustment set blocks all backdoor paths
But be careful not to control for colliders!
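The warning about colliders can be checked with a small simulation. The sketch below is an illustration only and is not part of the course code: the collider `C` and all coefficients are assumptions chosen for clarity, and it uses base R `lm()` rather than `quap()` so it runs without extra packages.

```r
set.seed(1)
N <- 1000

# X and Y are generated independently: the true effect of X on Y is zero
X <- rnorm(N)
Y <- rnorm(N)

# C is a collider: both X and Y point into it (X -> C <- Y)
C <- rnorm(N, mean = X + Y)

# Without C, the path X -> C <- Y is closed and the slope is near zero
b_without <- unname(coef(lm(Y ~ X))["X"])

# Conditioning on C opens the path and creates a spurious association
b_with <- unname(coef(lm(Y ~ X + C))["X"])

round(c(without_C = b_without, with_C = b_with), 2)
```

Under these assumed coefficients the slope on X after adjusting for `C` is about -0.5 in large samples, even though X has no effect on Y.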
Copy this code from the slides or the class book (Simulation 1: Simple Confounding).
Run the base simulation and observe results
Modify the simulation parameters:
# Sample size (number of simulated observations)
N <- 1000
# Generate data
U <- rnorm(N) # Unobserved confounder
X <- rnorm(N, mean = 0.5 * U) # Treatment affected by U
Y <- rnorm(N, mean = 0.8 * U) # Outcome affected by U only (X has no effect on Y)
Z <- rnorm(N, mean = 0.6 * U) # Observed variable that is a noisy proxy for U
d <- data.frame(X, Y, Z)
# Fit models (quap() and precis() come from the rethinking package)
library(rethinking)
flist1 <- alist(
Y ~ dnorm(mu, sigma),
mu <- a + bX*X,
a ~ dnorm(0, .5),
bX ~ dnorm(0, .25),
sigma ~ dexp(1)
)
m32.1 <- quap(flist1, d)
precis(m32.1)
             mean         sd        5.5%      94.5%
a     -0.02554477 0.03869275 -0.08738326 0.03629372
bX     0.29078366 0.03536878  0.23425752 0.34730980
sigma  1.22711182 0.02741407  1.18329883 1.27092480
Adding Z to the model as a second predictor (coefficient bZ) gives:
              mean         sd        5.5%      94.5%
a     -0.008883186 0.03715079 -0.06825733 0.05049095
bX     0.232572852 0.03451483  0.17741149 0.28773421
bZ     0.307952523 0.03303311  0.25515924 0.36074580
sigma  1.176568843 0.02628656  1.13455785 1.21857984
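As a rough cross-check on the estimates above, the same data-generating process can be fit with ordinary least squares. This is a sketch, not the course's method: it swaps `quap()` for base R `lm()` (so there are no priors), the seed is arbitrary, and the "oracle" model that adjusts for `U` is only possible in a simulation, where the unobserved confounder is available.

```r
set.seed(2)
N <- 1000

# Same data-generating process as the simulation above
U <- rnorm(N)                  # unobserved confounder
X <- rnorm(N, mean = 0.5 * U)  # treatment
Y <- rnorm(N, mean = 0.8 * U)  # outcome (no direct effect of X)
Z <- rnorm(N, mean = 0.6 * U)  # observed noisy proxy for U
d <- data.frame(U, X, Y, Z)

# Helper (hypothetical name): extract the slope on X from an lm() fit
slope_X <- function(f) unname(coef(lm(f, data = d))["X"])

b_naive  <- slope_X(Y ~ X)      # backdoor path through U is open
b_proxy  <- slope_X(Y ~ X + Z)  # Z blocks the path only partially
b_oracle <- slope_X(Y ~ X + U)  # adjusting for U itself blocks it

round(c(naive = b_naive, proxy = b_proxy, oracle = b_oracle), 2)
```

With these coefficients the large-sample slopes are 0.32 without adjustment and about 0.25 when adjusting for Z, so a noisy proxy removes only part of the confounding; only adjusting for `U` itself brings the slope close to the true value of zero.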
“Bad controls” can create bias in three main ways:
Warning signs of bad controls:
Modify your code for this new simulation (precision parasite):
Test your previous models with these new data, using different sample sizes (n = 50, 100, 1000). For each sample size, compare:
How does sample size affect the impact of the precision parasite? Under what conditions is the precision loss most severe?
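One possible starting point for this exercise is sketched below. It assumes the precision parasite is a variable `Z` that affects the treatment only (Z -> X -> Y, no confounding). The function name `sim_parasite` and the coefficients `bZX = 1` and `bXY = 0.5` are illustrative assumptions, and the sketch uses base R `lm()` in place of `quap()`; the course's own simulation may differ.

```r
set.seed(3)

# Hypothetical precision-parasite setup: Z -> X -> Y, with no confounding
sim_parasite <- function(n, bZX = 1, bXY = 0.5) {
  Z <- rnorm(n)
  X <- rnorm(n, mean = bZX * Z)
  Y <- rnorm(n, mean = bXY * X)
  # Standard error of the slope on X from a fitted lm object
  se_X <- function(fit) unname(summary(fit)$coefficients["X", "Std. Error"])
  c(n = n,
    se_without_Z = se_X(lm(Y ~ X)),
    se_with_Z    = se_X(lm(Y ~ X + Z)))
}

# One simulated data set per sample size
results <- t(sapply(c(50, 100, 1000), sim_parasite))
results <- cbind(results,
                 ratio = results[, "se_with_Z"] / results[, "se_without_Z"])
round(results, 3)
```

Comparing the two standard-error columns and their ratio across rows gives a first look at how the cost of adjusting for `Z` changes with sample size; changing `bZX` varies how much of X is explained by Z.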
Modify your code for this new simulation (bias amplification):
Compare different confounder strengths (0.5, 1, 2).
Questions:
* What happens to the bias when you control for Z?
* How does the strength of the confounding affect the amount of bias amplification?
* Can you explain why this happens using the DAG?
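A minimal sketch for exploring these questions follows. It assumes a DAG in which `Z` affects only X, an unobserved `U` affects both X and Y with the same strength, and X has no true effect on Y, so any nonzero slope is bias. The function name `sim_amplify`, the value `bZX = 1`, the sample size, and the use of base R `lm()` instead of `quap()` are all assumptions made for illustration.

```r
set.seed(4)

# Hypothetical setup: Z -> X, U -> X, U -> Y, and X has no effect on Y
sim_amplify <- function(strength, n = 1e4, bZX = 1) {
  Z <- rnorm(n)
  U <- rnorm(n)                                 # unobserved confounder
  X <- rnorm(n, mean = bZX * Z + strength * U)
  Y <- rnorm(n, mean = strength * U)            # true effect of X is zero
  c(strength = strength,
    bX_without_Z = unname(coef(lm(Y ~ X))["X"]),
    bX_with_Z    = unname(coef(lm(Y ~ X + Z))["X"]))
}

# One row per confounder strength
amp <- t(sapply(c(0.5, 1, 2), sim_amplify))
round(amp, 2)
```

Because the true effect is zero here, both slope columns are pure bias, which makes the with-Z and without-Z estimates directly comparable across confounder strengths.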
Modify your code to create a scenario with both a precision parasite variable and a bias amplification variable.
Questions:
* What happens to our estimates when we control for both variables?
* Is it better to:
  * Control for neither
  * Control for just one (which one?)
  * Control for both
* How can we use DAGs to decide which controls to include?